Chen, Ke; Shao, Mingfu (Algorithms for Molecular Biology)
Abstract
Background
Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing data with low error rates, but encounter much reduced sensitivity on data with high error rates. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps.
Results
In this paper, we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be $$(d_1, d_2)$$-sensitive if any two sequences within an edit distance of $$d_1$$ are mapped into at least one shared bucket, and any two sequences with distance at least $$d_2$$ are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of $$(d_1, d_2)$$ and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal.
Conclusion
These results lay the theoretical foundations for their practical use in analyzing sequences with high error rates while also providing insights for the hardness of designing ungapped LSH functions.
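To make the bucketing-function definition above concrete, the following minimal Python sketch implements one natural construction: map each sequence to its "deletion neighborhood," i.e., every string obtained by deleting exactly one character, with each such string naming a bucket. This is an illustration of the definition only (a (1, 3)-sensitive bucketing function), not necessarily one of the authors' constructions; all names are illustrative.

```python
def edit_distance(s, t):
    """Standard Levenshtein distance via dynamic programming."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def buckets(s):
    """Deletion neighborhood of s: every string obtained by deleting
    exactly one character.  Each such string names one bucket."""
    return {s[:i] + s[i + 1:] for i in range(len(s))}
```

Why this is (1, 3)-sensitive: two equal-length sequences at edit distance 1 differ by a single substitution, so deleting that position from both yields a shared bucket; conversely, sequences sharing a bucket are each within distance 1 of the common string, hence within distance 2 of each other, so sequences at distance at least 3 map to disjoint buckets.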
Modern methods for computation-intensive tasks in sequence analysis (e.g. read mapping, sequence alignment, genome assembly, etc.) often first transform each sequence into a list of short, regular-length seeds so that compact data structures and efficient algorithms can be employed to handle the ever-growing large-scale data. Seeding methods using k-mers (substrings of length k) have gained tremendous success in processing sequencing data with low mutation/error rates. However, they are much less effective for sequencing data with high error rates as k-mers cannot tolerate errors.
Results
We propose SubseqHash, a strategy that uses subsequences, rather than substrings, as seeds. Formally, SubseqHash maps a string of length n to its smallest subsequence of length k, k < n, according to a given order over all length-k strings. Finding the smallest subsequence of a string by enumeration is impractical as the number of subsequences grows exponentially. To overcome this barrier, we propose a novel algorithmic framework that consists of a specifically designed order (termed ABC order) and an algorithm that computes the smallest subsequence under an ABC order in polynomial time. We first show that the ABC order exhibits the desired property and that the probability of hash collision under the ABC order is close to the Jaccard index. We then show that SubseqHash overwhelmingly outperforms substring-based seeding methods in producing high-quality seed matches for three critical applications: read mapping, sequence alignment, and overlap detection. SubseqHash presents a major algorithmic breakthrough for tackling high error rates, and we expect it to be widely adopted for long-read analysis.
Availability and implementation
SubseqHash is freely available at https://github.com/Shao-Group/subseqhash.
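The ABC order and its polynomial-time minimization algorithm are the paper's contribution; as a simpler point of reference for why exhaustive enumeration is unnecessary, under the plain lexicographic order the smallest length-k subsequence can already be found greedily in linear time with a stack. This sketch illustrates that baseline only, not the ABC-order algorithm:

```python
def lex_smallest_subsequence(s, k):
    """Return the lexicographically smallest length-k subsequence of s.

    Greedy stack scan: pop a larger character off the stack whenever
    enough characters remain to still reach length k.  Runs in O(n),
    versus the exponentially many subsequences of s.
    """
    stack = []
    drops = len(s) - k  # number of characters we may still discard
    for ch in s:
        while stack and drops > 0 and stack[-1] > ch:
            stack.pop()
            drops -= 1
        stack.append(ch)
    return "".join(stack[:k])
```

The ABC order is designed so that, unlike the lexicographic order, similar strings are likely to share their minimized subsequence while still admitting a polynomial-time minimization algorithm.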
Road safety has always been a crucial priority for municipalities, as vehicle accidents claim lives every day. Recent rapid improvements in video collection and processing technologies enable traffic researchers to identify and alleviate potentially dangerous situations. This paper illustrates cutting-edge methods by which conflict hotspots can be detected in various situations and conditions. Both pedestrian–vehicle and vehicle–vehicle conflict hotspots can be discovered, and we present an original technique for encoding additional information in the hotspot graphs using shapes. Conflict hotspot detection, volume hotspot detection, and intersection-service evaluation allow us to understand the safety and performance issues and test countermeasures comprehensively. The selection of appropriate countermeasures is demonstrated by extensive analysis and discussion of two intersections in Gainesville, Florida, USA. Just as important is the evaluation of the efficacy of countermeasures. This paper advocates for selection from a menu of countermeasures at the municipal level, with safety as the top priority. Performance is also considered, and we present a novel concept of a performance–safety trade-off at intersections.
Chen, Ke; Shao, Mingfu (22nd International Workshop on Algorithms in Bioinformatics (WABI 2022))
Boucher, Christina; Rahmann, Sven (Eds.)
Intercalated layered materials offer distinctive properties and serve as precursors for important two-dimensional (2D) materials. However, intercalation of non–van der Waals structures, which can expand the family of 2D materials, is difficult. We report a structural editing protocol for layered carbides (MAX phases) and their 2D derivatives (MXenes). Gap-opening and species-intercalating stages were respectively mediated by chemical scissors and intercalants, which created a large family of MAX phases with unconventional elements and structures, as well as MXenes with versatile terminals. The removal of terminals in MXenes with metal scissors and then the stitching of 2D carbide nanosheets with atom intercalation leads to the reconstruction of MAX phases and a family of metal-intercalated 2D carbides, both of which may drive advances in fields ranging from energy to printed electronics.
Banerjee, Tania; Chen, Ke; Almaraz, Alejandro; Sengupta, Rahul; Karnati, Yashaswi; Grame, Bryce; Posadas, Emmanuel; Poddar, Subhadipto; Schenck, Robert; Dilmore, Jeremy; et al. (Proceedings of the 2022 IEEE International Intelligent Transportation Systems Conference (ITSC))
As a part of road safety initiatives, surrogate road safety approaches have gained popularity due to the rapid advancement of video collection and processing technologies. This paper presents an end-to-end software pipeline for processing traffic videos and running a safety analysis based on surrogate safety measures. We developed algorithms and software to determine trajectory movement and phases that, when combined with signal timing data, enable us to perform accurate event detection and categorization in terms of the type of conflict for both pedestrian-vehicle and vehicle-vehicle interactions. Using this information, we introduce a new surrogate safety measure, “severe event,” which is quantified by multiple existing metrics such as time-to-collision (TTC) and post-encroachment time (PET) as recorded in the event, deceleration, and speed. We present an efficient multistage event filtering approach followed by a multi-attribute decision tree algorithm that prunes the extensive set of conflicting interactions to a robust set of severe events. The above pipeline was used to process traffic videos from several intersections in multiple cities to measure and compare pedestrian and vehicle safety. Detailed experimental results are presented to demonstrate the effectiveness of this pipeline.
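The multistage filtering idea can be sketched as a coarse proximity filter followed by an intensity check. The thresholds and field names below are illustrative assumptions for the sketch only; the paper derives its criteria from a multi-attribute decision tree over recorded events, not fixed cutoffs like these:

```python
from dataclasses import dataclass

@dataclass
class ConflictEvent:
    ttc: float           # time-to-collision at the event, seconds
    pet: float           # post-encroachment time, seconds
    deceleration: float  # maximum observed deceleration, m/s^2
    speed: float         # speed at the conflict point, m/s

# Illustrative thresholds (assumed values, not from the paper).
TTC_MAX = 1.5
PET_MAX = 2.0
DECEL_MIN = 3.0
SPEED_MIN = 5.0

def is_severe(e):
    """Stage 1: at least one proximity indicator (TTC or PET) fires.
    Stage 2: the event also shows a strong evasive action or high speed."""
    proximate = e.ttc <= TTC_MAX or e.pet <= PET_MAX
    intense = e.deceleration >= DECEL_MIN or e.speed >= SPEED_MIN
    return proximate and intense

def filter_severe(events):
    """Prune a large set of conflicting interactions to severe events."""
    return [e for e in events if is_severe(e)]
```

Staging the filter this way keeps the cheap proximity test first, so the more detailed per-event attributes only need to be examined for candidates that are already close in time or space.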
A selenophene-containing conjugated organic ligand, 2-(4′-methyl-5′-(5-(3-methylthiophen-2-yl)selenophen-2-yl)-[2,2′-bithiophen]-5-yl)ethan-1-aminium (STm), was synthesized and incorporated into a Sn(II)-based two-dimensional perovskite, (STm)₂SnI₄. The band offset between the perovskite and ligand can be fine-tuned by introducing the STm ligand. Both field-effect transistor and light-emitting diode devices based on (STm)₂SnI₄ films exhibit high performance and enhanced operational stability.
Second sound refers to the phenomenon of heat propagation as temperature waves in the phonon hydrodynamic transport regime. We directly observe second sound in graphite at temperatures of over 200 K using a sub-picosecond transient grating technique. The experimentally determined dispersion relation of the thermal-wave velocity increases with decreasing grating period, consistent with first-principles-based solution of the Peierls-Boltzmann transport equation. Through simulation, we reveal this increase as a result of thermal zero sound—the thermal waves due to ballistic phonons. Our experimental findings are well explained with the interplay among three groups of phonons: ballistic, diffusive, and hydrodynamic phonons. Our ab initio calculations further predict a large isotope effect on the properties of thermal waves and the existence of second sound at room temperature in isotopically pure graphite.